Deep learning applications like OpenPose are heavily GPU-dependent due to the massive parallel computations they require. But what if you want to run such applications on lightweight edge devices or non-GPU machines? This is where GPU Virtualization comes into play. In this article, we explore what GPU virtualization is, why it is needed, how the GVirtuS framework makes it possible, and finally, how you can run OpenPose in real time over a virtualized GPU using GVirtuS.
Modern AI applications demand immense computational power. CPUs, though powerful for general-purpose tasks, fall short when it comes to handling the highly parallel workloads of neural networks, image processing, and deep learning pipelines. GPUs, with their parallel architecture, are the go-to hardware for such workloads.
However, not every device has a dedicated GPU. Edge devices, laptops, and small servers often lack the horsepower needed to run GPU-intensive applications, and equipping every machine in a distributed environment with its own GPU is costly.
GPU Virtualization bridges this gap. It allows a non-GPU device to “borrow” GPU resources from a remote GPU-enabled server. By virtualizing the GPU, we make high-performance computing available to lightweight devices—unlocking flexibility, cost efficiency, and scalability in AI deployments.
GVirtuS is an open-source GPU virtualization framework that enables applications built with CUDA Toolkit v12.6+ to efficiently leverage GPU acceleration remotely. It provides partial or full virtualization of key CUDA libraries, including cuDNN, cuFFT, cuBLAS, cuSPARSE, and cuSOLVER.
GVirtuS follows a split-driver model: the CUDA workload is divided between a frontend running on the non-GPU device and a backend running on the GPU-equipped server.
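To make the two halves concrete, here is a minimal sketch based on how upstream GVirtuS is typically set up; the paths, binary name, and properties file are assumptions and may differ in this repository's Docker-based workflow.

# Backend (GPU server): start the GVirtuS daemon that executes the
# forwarded CUDA calls on the physical GPU (upstream-style invocation).
$GVIRTUS_HOME/bin/gvirtus-backend $GVIRTUS_HOME/etc/properties.json

# Frontend (non-GPU device): point the application at the GVirtuS stub
# libraries so its CUDA/cuDNN calls are sent over the network instead
# of hitting a local driver.
export GVIRTUS_HOME=/path/to/GVirtuS/install
export LD_LIBRARY_PATH=$GVIRTUS_HOME/lib/frontend:$GVIRTUS_HOME/lib:$LD_LIBRARY_PATH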
OpenPose is one of the most popular human pose estimation frameworks. It detects body, hand, and facial keypoints from images and videos in real time.
Under the hood, OpenPose uses the Caffe deep learning library, which is highly optimized for GPU execution. Since OpenPose involves processing high-resolution images, extracting features through deep convolutional networks, and running inference for multiple keypoints per frame, it becomes an extremely GPU-intensive application.
Running OpenPose on a CPU-only machine leads to slow performance, making real-time pose estimation nearly impossible. This is where GVirtuS becomes invaluable.
GVirtuS virtualizes CUDA calls by splitting them between the frontend and the backend. Here’s how it works for OpenPose: the frontend library on the non-GPU device intercepts the CUDA and cuDNN calls issued by OpenPose’s Caffe engine, serializes them, and sends them over the network to the backend. The backend executes those calls on the physical GPU and returns the results, so OpenPose behaves as if a local GPU were attached.
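Because the interception happens at the library level, OpenPose itself needs no code changes. Assuming the frontend environment sketched earlier, one quick sanity check on the non-GPU device is to see which CUDA runtime the OpenPose binary resolves to (the binary path below is the standard OpenPose build location and is only illustrative):

# Check that CUDA/cuDNN resolve to the GVirtuS frontend stubs rather
# than a system CUDA installation.
ldd ./build/examples/openpose/openpose.bin | grep -E "cudart|cudnn"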
Let’s walk through the real-time execution of OpenPose with GVirtuS.
git clone https://github.com/ecn-aau/GVirtuS.git
cd GVirtuS
make run-gvirtus-backend-dev
This launches the GVirtuS backend process on the GPU-enabled server.
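Before moving to the frontend, it can help to confirm the backend is actually accepting connections. The port shown here is the upstream GVirtuS default (9999) and is an assumption; use whatever endpoint your properties file configures.

# On the GPU server: confirm the backend's TCP endpoint is listening
ss -ltn | grep 9999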
make run-openpose-test
This command spins up a Docker container, runs the 00_test.cpp example, and executes OpenPose with GPU virtualization.
The results are displayed directly on the frontend (non-GPU device).
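Beyond the bundled test, any standard OpenPose invocation should work the same way once the frontend environment is in place. As an illustration, a typical headless run over a folder of images uses OpenPose's own demo binary and flags; the paths are placeholders for your build and data locations.

# Process sample images without a display and save the rendered output
./build/examples/openpose/openpose.bin \
    --image_dir examples/media/ \
    --display 0 \
    --write_images output/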
GPU Virtualization is transforming how we deploy deep learning applications, making GPU acceleration accessible from almost any device. By using GVirtuS, we can unlock GPU resources across distributed environments and enable GPU-intensive frameworks like OpenPose to run even on lightweight edge devices.
For more setup instructions, check my previous guide: 👉 GVirtuS + OpenPose Integration
This work has been funded by the Clever Project. Special thanks to Associate Professor Sokol Kosta for his guidance and support.